Aggregated Estimators and Empirical Complexity for Least Square Regression

Author

  • JEAN-YVES AUDIBERT

Abstract

Numerous empirical results have shown that combining regression procedures can be a very efficient method. This work provides PAC bounds for the L² generalization error of such methods. The interest of these bounds is twofold. First, they give, for any aggregating procedure, a bound on the expected risk that depends on the empirical risk and on the empirical complexity, measured by the Kullback-Leibler divergence between the aggregating distribution ρ̂ and a prior distribution π and by the empirical mean of the variance of the regression functions under the probability ρ̂. Second, by structural risk minimization, we derive an aggregating procedure which takes advantage of the unknown properties of the best mixture f̃: when the best convex combination f̃ of d regression functions belongs to the d initial functions (i.e. when combining does not make the bias decrease), the convergence rate is of order (log d)/N. In the worst case, our combining procedure achieves a convergence rate of order √((log d)/N), which is known to be optimal in a uniform sense when d > √N (see [10, 15]). As in AdaBoost, our aggregating distribution tends to favor functions which disagree with the mixture on mispredicted points. Our algorithm is tested on artificial classification data (which have also been used for testing other boosting methods, such as AdaBoost).
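
To make the quantities entering the bound concrete, the sketch below computes, for a finite family of d regression functions evaluated at N sample points, the empirical risk of the mixture, the Kullback-Leibler divergence K(ρ̂, π) between an aggregating distribution ρ̂ and a prior π, and the empirical mean of the variance of the functions under ρ̂. As an assumption for illustration, ρ̂ is taken to be a plain exponential-weights (Gibbs) distribution, ρ̂_j ∝ π_j exp(−λ r_j), where r_j is the empirical risk of f_j and λ is a hypothetical inverse-temperature parameter. This is a standard aggregation scheme, not the structural-risk-minimization procedure derived in the paper, and the exact form of the PAC bound is not reproduced. All function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def empirical_risk(pred: Vector, y: Vector) -> float:
    """Empirical quadratic risk (1/N) * sum_i (pred_i - y_i)^2."""
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)


def gibbs_weights(preds: List[Vector], y: Vector, prior: Vector, lam: float) -> Vector:
    """Exponential weights rho_j proportional to prior_j * exp(-lam * r_j).

    Assumption: a generic Gibbs aggregation, not the paper's procedure.
    Every prior weight must be strictly positive. The largest log-weight is
    subtracted before exponentiating to avoid overflow and underflow.
    """
    log_w = [math.log(p) - lam * empirical_risk(f, y) for f, p in zip(preds, prior)]
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return [v / total for v in w]


def kl_divergence(rho: Vector, prior: Vector) -> float:
    """K(rho, pi) = sum_j rho_j * log(rho_j / pi_j), with the convention 0 log 0 = 0."""
    return sum(r * math.log(r / p) for r, p in zip(rho, prior) if r > 0)


def mixture(preds: List[Vector], rho: Vector) -> Vector:
    """Convex combination f_rho(x_i) = sum_j rho_j * f_j(x_i) at each sample point."""
    n = len(preds[0])
    return [sum(r * f[i] for r, f in zip(rho, preds)) for i in range(n)]


def empirical_variance(preds: List[Vector], rho: Vector) -> float:
    """Empirical mean of the variance under rho:
    (1/N) * sum_i sum_j rho_j * (f_j(x_i) - f_rho(x_i))^2."""
    f_rho = mixture(preds, rho)
    n = len(f_rho)
    return sum(r * (f[i] - f_rho[i]) ** 2 for r, f in zip(rho, preds) for i in range(n)) / n


if __name__ == "__main__":
    # Toy data, made up for illustration only.
    y = [0.0, 1.0, 2.0, 3.0]
    preds = [
        [0.0, 1.0, 2.0, 3.0],  # f_1 interpolates the targets
        [1.0, 1.0, 1.0, 1.0],  # f_2 is a constant predictor
        [3.0, 2.0, 1.0, 0.0],  # f_3 has the opposite trend
    ]
    prior = [1.0 / 3] * 3  # uniform prior pi over the d = 3 functions
    rho = gibbs_weights(preds, y, prior, lam=2.0)
    print("rho_hat       =", [round(r, 4) for r in rho])
    print("mixture risk  =", round(empirical_risk(mixture(preds, rho), y), 4))
    print("KL(rho, pi)   =", round(kl_divergence(rho, prior), 4))
    print("emp. variance =", round(empirical_variance(preds, rho), 4))
```

Increasing λ concentrates ρ̂ on the functions with the smallest empirical risk and increases K(ρ̂, π), while λ = 0 returns ρ̂ = π with zero divergence; this is the kind of trade-off between empirical risk and empirical complexity that the bound above quantifies.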

Similar sources

Parameter Estimation Through Weighted Least-Squares Rank Regression with Specific Reference to the Weibull and Gumbel Distributions

Least squares regression based on probability plots, also called rank regression, can be used to estimate the parameters of some distributions. Regression is performed between a function of the empirical distribution function and the order statistic as the independent variable. Using large sample properties of the empirical distribution function and order statistics, weights to stabilize the va...

Estimation in multiple regression model with elliptically contoured errors under MLINEX loss

This paper considers estimation of the regression vector of the multiple regression model with elliptically symmetric contoured errors. The generalized least square (GLS), restricted GLS and preliminary test (PT) estimators for regression parameter vector are obtained. The performances of the estimators are studied under multiparameter linear exponential loss function (MLINEX), and the dominanc...

Robust Estimation of Multiple Regression Model with Non-normal Error: Symmetric Distribution

In this paper, we develop the modified maximum likelihood (MML) estimators for the multiple regression coefficients in linear model with the underlying distribution assumed to be symmetric, one of Student's t family. We obtain the closed form of the estimators and derive their asymptotic properties. In addition, we demonstrate that the MML estimators are more appropriate to estimate the paramet...

The Ratio-type Estimators of Variance with Minimum Average Square Error

The ratio-type estimators have been introduced for estimating the population mean and total, but in recent years several estimators for the population variance have been proposed based on the ratio method. In this paper two families of estimators are suggested and their approximate mean square errors (MSE) are derived. In addition, the efficiency of these variance estimators is com...

Short Term Load Forecasting Using Empirical Mode Decomposition, Wavelet Transform and Support Vector Regression

Short-term forecasting of electric load plays an important role in the design and operation of power systems. Due to the nature of the short-term electric load time series (nonlinear, non-constant, and non-seasonal), accurate prediction of the load is very challenging. In this article, a method for short-term daily and hourly load forecasting is proposed. In this method, in the first step, t...

Publication date: 2004